183 research outputs found

    Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model

    It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
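    The idea of flagging ungapped regions that are too long for a neutral model can be illustrated with a minimal sketch. It assumes, for illustration only, that neutral indels fall independently at a constant per-site probability, so ungapped segment lengths are geometrically distributed; the abstract does not state the form of the model, and the function names and the values of `gap_prob` and `n_segments` are hypothetical.

```python
import math


def geometric_tail(length, gap_prob):
    """P(L >= length) for an ungapped segment of length L >= 1.

    Assumption (not taken from the abstract): each site ends the segment
    independently with probability gap_prob, so L is geometric.
    """
    return (1.0 - gap_prob) ** (length - 1)


def expected_neutral_count(length, gap_prob, n_segments):
    """Expected number of neutral segments at least `length` sites long."""
    return n_segments * geometric_tail(length, gap_prob)


def length_threshold(gap_prob, n_segments, max_expected=1.0):
    """Smallest length at which at most `max_expected` neutral segments
    of that length or longer are expected genome-wide.

    Solves n * (1 - p)^(L - 1) <= max_expected for L.
    """
    ratio = math.log(max_expected / n_segments) / math.log(1.0 - gap_prob)
    return 1 + math.ceil(ratio)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    p, n = 0.05, 1_000_000
    t = length_threshold(p, n)
    print("threshold length:", t)
    print("expected neutral segments at threshold:", expected_neutral_count(t, p, n))
```

    Under these assumptions, segments longer than the threshold are rarely produced by the neutral process alone, which is why an excess of them would point to purifying selection against indels.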

    Adaptive Evolution of Conserved Noncoding Elements in Mammals

    Conserved noncoding elements (CNCs) are an abundant feature of vertebrate genomes. Some CNCs have been shown to act as cis-regulatory modules, but the function of most CNCs remains unclear. To study the evolution of CNCs, we have developed a statistical method called the “shared rates test” to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree. We report an application of this method to alignments of 98,910 CNCs from the human, chimpanzee, dog, mouse, and rat genomes. We find that ∼68% of CNCs evolve according to a null model where, for each CNC, a single parameter models the level of constraint acting throughout the phylogeny linking these five species. The remaining ∼32% of CNCs show departures from the basic model including speed-ups and slow-downs on particular branches and occasionally multiple rate changes on different branches. We find that a subset of the significant CNCs have evolved significantly faster than the local neutral rate on a particular branch, providing strong evidence for adaptive evolution in these CNCs. The distribution of these signals on the phylogeny suggests that adaptive evolution of CNCs occurs in occasional short bursts of evolution. Our analyses suggest a large set of promising targets for future functional studies of adaptation

    Poultry Genomics Puts Meat on the Table

    If the faithful had any concerns that their 'model' has not added a lot of value to the information being mined from the human genome, it was not evident at the meeting's end. Rather, I sensed hope and optimism and a clear plan as to what should come next. On the 'to do' list are completion of the genome sequence (in particular the sex chromosomes), creation of a chick atlas of development and a MOD, as well as other subjects. The plan is to hold a CSHL meeting every 2 years (keep an eye on the AvianNet www site for news: www.chicken-genome.org or CSHL: http://meetings.cshl.edu/meetings/chick05.shtml for the next meeting on 7-10 May 2006) to focus on Genome Biology, and to alternate this with a meeting at another location outside of the USA to focus on the Biology of Birds. Claudio Stern (c.stern@ucl.ac.uk) will host such a meeting in 2007 in Barcelona, Spain, with a major focus on Development, the Immune System and Evolutionary Biology. Dave Burt also suggested that we will search for support of graduate students and post-docs to participate in future meetings (so any sponsors interested let him know). Even as concerns remain about losing genetic stocks that helped the poultry genetics community make significant contributions to vertebrate biology, participants felt that there was strong interest in the continued use of the chicken in comparative biology. It was clear that the draft of the chicken sequence is definitely just 'the end of the beginning', if that

    Transcription Factor Map Alignment of Promoter Regions

    We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments
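    Aligning sequences of binding-factor labels rather than nucleotides can be sketched as follows. This is not the authors' algorithm: the abstract describes an adaptation of a restriction-map alignment method, whereas the sketch below swaps in plain Needleman-Wunsch global alignment over labels and ignores site positions and prediction scores. The function name, the scoring values, and the example labels are all hypothetical.

```python
def align_tf_labels(a, b, match=2, mismatch=-1, gap=-1):
    """Globally align two lists of transcription-factor labels.

    Returns (score, pairs), where pairs lists the (index_in_a, index_in_b)
    positions whose labels were aligned and are identical.
    Simplified Needleman-Wunsch; scoring values are illustrative only.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two labels
                score[i - 1][j] + gap,      # skip a label in a
                score[i][j - 1] + gap,      # skip a label in b
            )

    # Trace back from the bottom-right corner to recover matched labels.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    # Hypothetical label sequences for two promoters.
    human = ["SP1", "TBP", "CREB"]
    mouse = ["SP1", "CREB"]
    print(align_tf_labels(human, mouse))
```

    Because only labels are compared, two promoters can align well even when their underlying nucleotide sequences show no discernible conservation, which is the situation the abstract describes.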

    Role of APOBEC3 in Genetic Diversity among Endogenous Murine Leukemia Viruses

    The ability of human and murine APOBECs (specifically, APOBEC3) to inhibit infecting retroviruses and retrotransposition of some mobile elements is becoming established. Less clear is the effect that they have had on the establishment of the endogenous proviruses resident in the human and mouse genomes. We used the mouse genome sequence to study diversity and genetic traits of nonecotropic murine leukemia viruses (polytropic [Pmv], modified polytropic [Mpmv], and xenotropic [Xmv] subgroups), the best-characterized large set of recently integrated proviruses. We identified 49 proviruses. In phylogenetic analyses, Pmvs and Mpmvs were monophyletic, whereas Xmvs were divided into several clades, implying a greater number of replication cycles between the integration events. Four distinct primer binding site types (Pro, Gln1, Gln2 and Thr) were dispersed within the phylogeny, indicating frequent mispriming. We analyzed the frequency and context of G-to-A mutations for the role of mA3 in formation of these proviruses. In the Pmv and Mpmv (but not Xmv) groups, mutations attributable to mA3 constituted a large fraction of the total. A significant number of nonsense mutations suggests the absence of purifying selection following mutation. A strong bias of G-to-A relative to C-to-T changes was seen, implying a strand specificity that can only have occurred prior to integration. The optimal sequence context of G-to-A mutations, TTC, was consistent with mA3. At least in the Pmv group, a significant 5′ to 3′ gradient of G-to-A mutations was consistent with mA3 editing. Altogether, our results for the first time suggest mA3 editing immediately preceding the integration event that led to retroviral endogenization, contributing to inactivation of infectivity

    On the Origin and Evolution of Vertebrate Olfactory Receptor Genes: Comparative Genome Analysis Among 23 Chordate Species

    Olfaction is a primitive sense in organisms. Both vertebrates and insects have receptors for detecting odor molecules in the environment, but the evolutionary origins of these genes are different. Among studied vertebrates, mammals have ∼1,000 olfactory receptor (OR) genes, whereas teleost fishes have much smaller (∼100) numbers of OR genes. To investigate the origin and evolution of vertebrate OR genes, I attempted to determine near-complete OR gene repertoires by searching whole-genome sequences of 14 nonmammalian chordates, including cephalochordates (amphioxus), urochordates (ascidian and larvacean), and vertebrates (sea lamprey, elephant shark, five teleost fishes, frog, lizard, and chicken), followed by a large-scale phylogenetic analysis in conjunction with mammalian OR genes identified from nine species. This analysis showed that the amphioxus has >30 vertebrate-type OR genes though it lacks distinctive olfactory organs, whereas all OR genes appear to have been lost in the urochordate lineage. Some groups of genes (θ, κ, and λ) that are phylogenetically nested within vertebrate OR genes showed few gene gains and losses, which is in sharp contrast to the evolutionary pattern of OR genes, suggesting that they are actually non-OR genes. Moreover, the analysis demonstrated a great difference in OR gene repertoires between aquatic and terrestrial vertebrates, reflecting the necessity for the detection of water-soluble and airborne odorants, respectively. However, a minor group (β) of genes that are atypically present in both aquatic and terrestrial vertebrates was also found. These findings should provide a critical foundation for further physiological, behavioral, and evolutionary studies of olfaction in various organisms

    Regional differences in recombination hotspots between two chicken populations

    Background: Although several genetic linkage maps of the chicken genome have been published, the resolution of these maps is limited and does not allow the precise identification of recombination hotspots. The availability of more than 3.2 million SNPs in the chicken genome and the recent advances in high throughput genotyping techniques enabled us to increase marker density for the construction of a high-resolution linkage map of the chicken genome. This high-resolution linkage map allowed us to study recombination hotspots across the genome between two chicken populations: a purebred broiler line and a broiler × broiler cross. In total, 1,619 animals from the two different broiler populations were genotyped with 17,790 SNPs. Results: The resulting linkage map comprises 13,340 SNPs. Although 360 polymorphic SNPs that had not been assigned to a known chromosome on chicken genome build WASHUC2 were included in this study, no new linkage groups were found. The resulting linkage map is composed of 31 linkage groups, with a total length of 3,054 cM for the sex-average map of the combined population. The sex-average linkage map of the purebred broiler line is 686 cM smaller than the linkage map of the broiler × broiler cross. Conclusions: In this study, we present a linkage map of the chicken genome at a substantially higher resolution than previously published linkage maps. Regional differences in recombination hotspots between the two mapping populations were observed in several chromosomes near the telomere of the p arm; the sex-specific analysis revealed that these regional differences were mainly caused by female-specific recombination hotspots in the broiler × broiler cross.

    Independent Mammalian Genome Contractions Following the KT Boundary

    Although it is generally accepted that major changes in the earth's history are significant drivers of phylogenetic diversification and extinction, such episodes may also have long-lasting effects on genomic architecture. Here we show that widespread reductions in genome size have occurred in multiple lineages of mammals subsequent to the Cretaceous–Tertiary (KT) boundary, whereas there is no evidence for such changes in other vertebrate, invertebrate, or land plant lineages. Although the mechanisms remain unclear, such shifts in mammalian genome evolution may be a consequence of an increase in the efficiency of selection against excess DNA resulting from post-KT population size expansions. Independent historical changes in genome architecture in diverse lineages raise a significant challenge to the idea that genome size is finely tuned to achieve adaptive phenotypic modifications and suggest that attempts to use phylogenetic analysis to infer ancestral genome sizes may be problematical

    Genomics and proteomics of vertebrate cholesterol ester lipase (LIPA) and cholesterol 25-hydroxylase (CH25H)

    Cholesterol ester lipase (LIPA; EC 3.1.1.13) and cholesterol 25-hydroxylase (CH25H; EC 1.14.99.48) play essential roles in cholesterol metabolism in the body by hydrolysing cholesteryl esters and triglycerides within lysosomes (LIPA) and by catalysing the formation of 25-hydroxycholesterol from cholesterol (CH25H), which acts to repress cholesterol biosynthesis. Bioinformatic methods were used to predict the amino acid sequences, structures and genomic features of several vertebrate LIPA and CH25H genes and proteins, and to examine the phylogeny of vertebrate LIPA. Amino acid sequence alignments and predicted subunit structures enabled the identification of key sequences previously reported for human LIPA and CH25H and transmembrane structures for vertebrate CH25H sequences. Vertebrate LIPA and CH25H genes were located in tandem on all vertebrate genomes examined and showed several predicted transcription factor binding sites and CpG islands located within the 5′ regions of the human genes. Vertebrate LIPA genes contained nine coding exons, while all vertebrate CH25H genes were without introns. Phylogenetic analysis demonstrated the distinct nature of the vertebrate LIPA gene and protein family in comparison with other vertebrate acid lipases, and suggested that this family evolved from an ancestral LIPA gene that predated the appearance of vertebrates.

    A single amino acid substitution confers enhanced methylation activity of mammalian Dnmt3b on chromatin DNA

    Dnmt3a and Dnmt3b are paralogous enzymes responsible for de novo DNA methylation but with distinct biological functions. In mice, disruption of Dnmt3b but not Dnmt3a causes global DNA hypomethylation, especially in repetitive sequences, which comprise the large majority of methylated DNA in the genome. By measuring DNA methylation activity of Dnmt3a and Dnmt3b homologues from five species, we found that mammalian Dnmt3b possessed significantly higher methylation activity on chromatin DNA than Dnmt3a and non-mammalian Dnmt3b. Sequence comparison and mutagenesis experiments identified a single amino acid substitution (I662N) in mammalian Dnmt3b as being crucial for its high chromatin DNA methylation activity. Further mechanistic studies demonstrated that this substitution markedly enhanced the binding of Dnmt3b to nucleosomes and hence increased the chromatin DNA methylation activity. Moreover, this substitution was crucial for Dnmt3b to efficiently methylate repetitive sequences, which increased dramatically in mammalian genomes. Consistent with our observation that Dnmt3b evolved more rapidly than Dnmt3a during the emergence of mammals, these results demonstrated that the I662N substitution in mammalian Dnmt3b conferred enhanced chromatin DNA methylation activity and contributed to functional adaptation in the epigenetic system.